

# IMPLEMENTATION OF POWER EFFICIENT APPROXIMATE 2-D DCT ARCHITECTURE REALIZATION USING ADDER COMPRESSOR

# N.VENKATA PRADEEP<sup>1</sup>, K.BALA<sup>2</sup>

<sup>1</sup>PG Student, Dept of ECE (VLSISD), SITS, Kadapa, AP, India. <sup>2</sup>Associate Professor & HOD, Dept of ECE, SITS, Kadapa, AP, India.

Abstract- This work employs efficient adder compressors to realize power-efficient approximate DCT architectures. The approximate DCT is an approximation of the cosine transform whose resultant matrix is composed only of 0 and ±1 values. Therefore, it can be easily implemented using only adders and subtractors rather than general-purpose multipliers. In this work we use combinations of efficient 4-2 and 8-2 adder compressors for state-of-the-art approximate DCT implementations. They are compared with the same approximate DCTs implemented with the adder (+) operator provided by the synthesis tool. A synthesis methodology enables power analysis with vectors taken from real images. Results show that both approximate DCTs using adder compressors reduce both power and area when compared with the same DCT architectures using the (+) operator. As the entire 2-D DCT is implemented using two stages of 1-D DCT, a total of twelve and ten 4-2 adder compressors are used, respectively. The approximate DCT solutions with adder compressors minimize both cell area and power consumption with good overall image quality.

IJMTARC - VOLUME - VI - ISSUE - 24, OCT-DEC, 2018

## I. INTRODUCTION

The compression of digital images is needed because of the great amount of data that must be stored and transferred. The discrete transforms most often used for image compression are the Discrete Wavelet Transform (DWT) and the Discrete Cosine Transform (DCT). The DCT, however, is the preferred choice for some applications, such as the new High Efficiency Video Coding (HEVC) standard. The realization of discrete transforms represents a challenge to academia and industry due to the high computational effort involved in the calculations. As a result, systems operating in real time require efficient implementations of this module.

VLSI stands for "Very Large Scale Integration". It is a classification of ICs. A typical VLSI IC includes millions of active devices. Typical functions of VLSI include memories, computers, and signal processors. A semiconductor process technology is a method by which working circuits can be manufactured from designed specifications. There are many such technologies, each of which creates a different environment or style of design. In integrated circuit design, the specifications consist of polygons of conducting and semiconducting material that will be layered on top of each other to produce a working chip. A chip that is custom-designed for a specific use is called an ASIC (application-specific integrated circuit).

Printed-circuit (PC) design also results in precise positions of conducting materials as they will appear on a circuit board. In addition, PC design aggregates the bulk of the electronic activity into standard IC packages, the position and interconnection of which are essential to the final circuit. Printed circuitry may be easier to debug than integrated circuitry, but it is slower, more expensive, less compact, and unable to take advantage of the specialized silicon layout structures that make VLSI systems so attractive.

## **OBJECTIVE**

The major contribution of this project is the presentation of power-efficient state-of-the-art approximate DCTs built with efficient adder compressors. The approximate DCT realizations exploit combinations of 4-2 and 8-2 adder compressors. They are compared with the same approximate DCTs implemented with the adder (+) operator provided by the synthesis tool. A synthesis methodology enables power analysis with vectors taken from real images. Results show that both approximate DCTs with adder compressors reduce both power and area when compared with the same DCT designs using the (+) operator. As the entire 2-D DCT is implemented using two stages of 1-D DCT, a total of twelve and ten 4-2 adder compressors are used, respectively.

#### Drawbacks:

- High multiplication requirement
- More delay
- High power

ISSN: 2320-1363

The approximate DCT architectures are implemented in Verilog, and we present an environment whose synthesis reports are based on a set of real images as input vectors in order to obtain realistic power results. The results show that the approximate DCT hardware solutions reduce both power consumption and area with overall good image quality.

#### **II. DISCRETE COSINE TRANSFORM**

The DCT is regarded as the best approximation to the Karhunen-Loève Transform (KLT), which is the optimal transform from the perspective of signal decorrelation. In simplified terms, the DCT decorrelates the signal among neighboring samples, keeping only the essential information in the upper-left corner of the resulting matrix, but with less computational effort than the KLT. The DCT matrix and its inverse (IDCT) are defined by Equations 1 and 2, respectively. The resulting matrix of DCT coefficients demands numerous floating-point multiplications, which results in a high computational effort, making the system slow and causing high power consumption.

where $i, j = 1, 2, 3, 4, \dots, N$
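For reference, the textbook orthonormal $N$-point DCT-II matrix can be written with the same index range. This is the standard definition and is only assumed to correspond to the equations the text refers to; the symbols $C$, $X$, and $F$ are introduced here for illustration.

$$
C_{i,j} =
\begin{cases}
\sqrt{\dfrac{1}{N}}, & i = 1 \\
\sqrt{\dfrac{2}{N}} \cos\left( \dfrac{(2j-1)(i-1)\pi}{2N} \right), & i = 2, \dots, N
\end{cases}
\qquad j = 1, \dots, N
$$

With an $N \times N$ input block $X$, the forward and inverse 2-D transforms under this definition are

$$
F = C \, X \, C^{T}, \qquad X = C^{T} F \, C ,
$$

where the inverse follows from the orthogonality of $C$ (that is, $C^{-1} = C^{T}$).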

# A. APPROXIMATE DISCRETE COSINE TRANSFORMS

An earlier work introduced a novel algorithm, called the Rounded Cosine Transform (RCT), intended to reduce the computational effort of the DCT with minimal impact on image quality. This algorithm is an approximation of the cosine function that essentially rounds each element of the DCT multiplication matrix to the nearest integer. In the RCT, the approximated cosine matrix uses only the values -1, 0, and 1, which are trivial multipliers that can be implemented using only adders and subtractors. The RCT solution uses 22 adder operations.
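The rounding idea can be illustrated with a small behavioral model in Python (not the Verilog used for the hardware). This is a sketch under stated assumptions: the orthonormal 8-point DCT-II matrix is scaled by a gain of 2 before rounding, which is an assumption of this example and not a value given in the text, and the function names are hypothetical. The plain matrix-vector product below does not reproduce the 22-addition fast algorithm; it only shows that no multiplier is needed.

```python
import math

N = 8  # transform length (8-point DCT)

def dct_matrix(n=N):
    """Orthonormal n-point DCT-II matrix as a list of rows."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * i * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def rounded_matrix(c, gain=2.0):
    """Round every entry of gain*c to the nearest integer.

    gain=2.0 is an assumption of this sketch, not taken from the text.
    """
    return [[int(round(gain * v)) for v in row] for row in c]

def apply_1d(t, x):
    """Multiply a {-1, 0, 1} matrix by a vector using only add/subtract."""
    out = []
    for row in t:
        acc = 0
        for coeff, sample in zip(row, x):
            if coeff == 1:
                acc += sample      # trivial multiplier +1: adder
            elif coeff == -1:
                acc -= sample      # trivial multiplier -1: subtractor
            # coeff == 0 contributes nothing
        out.append(acc)
    return out

T = rounded_matrix(dct_matrix())
y = apply_1d(T, [10, 20, 30, 40, 50, 60, 70, 80])
```

Under this assumption every entry of `T` lands in {-1, 0, 1}, so each output sample is a signed sum of input samples.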

# **B. METHODOLOGY FOR THE IMAGE COMPRESSION**

The steps of the image compression adopted in this work are presented in Fig. 2, where the implementations in software (using Matlab) and hardware are highlighted in gray and red, respectively.



Fig.1. 8 X 8 Transposition buffer for common 2-D transform by one input data line/cycle

The architecture implements a 2-D DCT, which is composed of two 1-D transforms and a transposition buffer. In this work, we adopt a transposition buffer based on two 8 X 8 buffers, as presented in Fig. 1. While buffer A stores the results from the 1<sup>st</sup> stage of the 1-D transform, buffer B provides the previous results to the 2<sup>nd</sup> stage of the 1-D transform. The 2-D DCT is also realized with adder compressors.



Fig.2. Image compression procedure used in this work: software implementations in gray and hardware implementations in red.

The purpose of this section is to present an overview of compression, in particular image compression using the DCT. It covers the different kinds of architectures used to compute the 2-D DCT and the approach followed for image compression using the DCT.

#### C. PRINCIPLE BEHIND COMPRESSION

The number of bits required to represent the information in an image can be minimized by removing the redundancy present in it. There are three types of redundancy: (1) spatial redundancy, which is due to the correlation or dependence between neighboring pixel values; (2) spectral redundancy, which is due to the correlation between different spectral bands or color planes; and (3) temporal redundancy, which is present because of the correlation between different frames of images. Image compression research aims to reduce the number of bits required to represent an image by removing the spatial and spectral redundancies as much as possible.

#### **1. TYPES OF JPEG COMPRESSION**

Still image coding is an important application of data compression. When an analog picture or image is digitized, each pixel is represented by a fixed number of bits, which corresponds to a certain number of gray levels. In this uncompressed format, the digitized image requires a large number of bits to be stored or transmitted. As a result, compression becomes necessary due to limited communication bandwidth or storage size. The JPEG standard allows for both lossless and lossy encoding of still images.

#### 2. ADDER COMPRESSORS OVERVIEW

The basic structure of the 4-2 adder compressor, as well as the combination of 4-2 adder compressors to form an 8-2 adder compressor, is presented in this part.

#### I. 4 to 2 Adder Compressor

The 4-2 adder compressor has 5 inputs and 3 outputs, where the four inputs (X1, X2, X3, X4) and the output Sum have the same weight. On the other hand, the outputs Carry and Cout are one binary order higher. One important characteristic of this compressor is the independence of Cout (output carry) from Cin (input carry). The final addition result S of the 4-2 adder compressor is given by

S = Sum + 2(Cout + Carry)

To recombine Sum and Carry, a cascade of full-adder and half-adder circuits in a ripple-carry structure is used.
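The relation above can be checked exhaustively with a bit-level Python model. This is a sketch, assuming the common construction from two cascaded full adders rather than the mux-based cell of the paper; the two are logically interchangeable for this check, and the function names are hypothetical.

```python
from itertools import product

def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry

def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 adder compressor built from two full adders.

    Returns (Sum, Carry, Cout). Cout is produced by the first full adder,
    so it does not depend on cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout

def check_all():
    """Verify x1+x2+x3+x4+cin == Sum + 2*(Cout + Carry) for all 32 cases."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (cout + carry):
            return False
    return True
```

Since Cout is available before Cin arrives, neighboring bit slices do not form a long carry chain, which is the property the text highlights.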



Fig.3. Mux-based cellular structure of 4-2 Adder Compressor

## II. 8-2 ADDER COMPRESSORS

The hierarchical structure of an 8-2 adder compressor based on 4-2 compressors is presented in Fig.3. The 8-2 adder compressor is composed of 13 inputs (8 primary inputs and 5 $C_{in}$) and 7 outputs (1 Sum, 1 Carry, and 5 Cout terms). While other adder compressors have been proposed in the literature, such as 3:2, 5:2, and 7:2, the structures used in this work provide the best fit for the approximate DCT implementations.
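The hierarchy can be illustrated at word level in Python: two 4-2 compressors reduce eight operands to four vectors, and a third 4-2 compressor reduces those to two, which a final carry-propagate adder recombines. This is a minimal sketch assuming non-negative integers of unbounded width, so the per-slice Cin and Cout wires appear only implicitly as left shifts; the function names are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save step on non-negative ints: a + b + c == s + k."""
    s = a ^ b ^ c
    k = ((a & b) | (a & c) | (b & c)) << 1  # carries move one position up
    return s, k

def compress_4_2(a, b, c, d):
    """Word-level 4-2 compression: a + b + c + d == s + k."""
    s1, k1 = csa(a, b, c)
    return csa(s1, k1, d)

def compress_8_2(ops):
    """8-2 compression built hierarchically from three 4-2 compressors."""
    s_a, k_a = compress_4_2(*ops[:4])
    s_b, k_b = compress_4_2(*ops[4:])
    return compress_4_2(s_a, k_a, s_b, k_b)

def add8(ops):
    """Sum eight operands: compress to two vectors, then one final add."""
    s, k = compress_8_2(ops)
    return s + k  # models the final ripple-carry recombination
```

For example, `add8([1, 2, 3, 4, 5, 6, 7, 8])` evaluates to the same value as the built-in `sum` of that list, while carry propagation happens only once, in the last addition.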


#### Field-Programmable Gate Array (FPGA)

An FPGA is a field-programmable device (FPD) featuring a general structure that allows very high logic capacity. Whereas CPLDs feature logic resources with a wide number of inputs (AND planes), FPGAs offer narrower logic resources. FPGAs also offer a higher ratio of flip-flops to logic resources than CPLDs do. An FPGA is a kind of CPLD turned inside out. The logic is divided into a large number of programmable logic blocks that are individually smaller than a PLD. They are distributed across the entire chip in a sea of programmable interconnections, and the entire array is surrounded by programmable I/O blocks. An FPGA's programmable logic block is less capable than a typical PLD, but an FPGA chip contains many more logic blocks than a CPLD of the same die size has PLDs.

# Development of Programmable Logic Devices

The first type of user-programmable chip that could implement logic circuits was the Programmable Read-Only Memory (PROM), in which address lines can be used as logic circuit inputs and data lines as outputs. Logic functions, however, rarely require more than a few product terms, and a PROM contains a full decoder for its address inputs. PROMs are thus an inefficient architecture for realizing logic circuits, and they are rarely used in practice for that purpose. The first device developed specifically for implementing logic circuits was the Field-Programmable Logic Array (FPLA), or simply PLA for short. A PLA consists of two levels of logic gates: a programmable wired AND-plane followed by a programmable wired OR-plane. A PLA is structured so that any of its inputs can be ANDed together in the AND-plane; each AND-plane output can thus correspond to any product term of the inputs. Similarly, each OR-plane output can be configured to produce the logical sum of the AND-plane outputs.
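The AND-plane/OR-plane organization can be modeled in a few lines of Python. This is an illustrative sketch only: the data layout (dictionaries for product terms, index lists for OR-plane rows) and the full-adder programming are assumptions made for this example and do not describe any specific device.

```python
def pla(inputs, and_plane, or_plane):
    """Evaluate a PLA.

    inputs:    dict mapping input name -> 0 or 1
    and_plane: list of product terms; each term maps an input name to the
               value (1 = true literal, 0 = complemented literal) it requires
    or_plane:  list of outputs; each output lists the product-term indices
               that are ORed together
    """
    terms = [all(inputs[name] == value for name, value in term.items())
             for term in and_plane]
    return [int(any(terms[i] for i in selected)) for selected in or_plane]

# Example programming: a 1-bit full adder as sum-of-products.
AND_PLANE = [
    {"a": 1, "b": 0, "c": 0},  # term 0
    {"a": 0, "b": 1, "c": 0},  # term 1
    {"a": 0, "b": 0, "c": 1},  # term 2
    {"a": 1, "b": 1, "c": 1},  # term 3
    {"a": 1, "b": 1},          # term 4
    {"a": 1, "c": 1},          # term 5
    {"b": 1, "c": 1},          # term 6
]
OR_PLANE = [
    [0, 1, 2, 3],  # output 0: sum
    [4, 5, 6],     # output 1: carry
]

def full_adder_via_pla(a, b, c):
    """Return [sum, carry] computed by the programmed PLA model."""
    return pla({"a": a, "b": b, "c": c}, AND_PLANE, OR_PLANE)
```

Each output needs only a handful of product terms here, which is the point the text makes about why a PROM's full address decoder is wasteful for logic.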

#### **III. SIMULATION RESULTS**

A. RTL SCHEMATIC FOR 1-D DCT ARCHITECTURE





Fig.4. RTL Schematic for 1-D DCT architecture

The hierarchical structure of the 1-D approximate DCT architecture with 8-2 and 4-2 adder compressors is presented in the figure above. The 1-D DCT architecture is composed of thirteen inputs (eight primary inputs and five carry inputs) and two outputs ( $S_0$  and  $S_4$ ).

#### TECHNOLOGY SCHEMATIC




Fig.5. Technological Schematic

The hierarchical structure of the 1-D DCT uses four 4-2 adder compressors, two inverters, and two full adders. This architecture is composed of thirteen inputs (eight primary inputs and five carry inputs) and two outputs ( $S_0$  and  $S_4$ ).








Fig.6. Technology schematic of the 1-D DCT architecture, using LUT and buffer units.

The architecture and VHDL design of a 2-D DCT, combined with quantization and zig-zag arrangement, is described. The architecture is used in JPEG image compression. The output of the DCT module needs to be multiplied by a post-scaler value to obtain the real DCT coefficients. The post-scaling process is done together with the quantization process. The 2-D DCT is computed by combining two 1-D DCTs connected by a transpose buffer. This design is aimed at implementation on a low-cost Xilinx Spartan-3E XC3S500E FPGA. The 2-D DCT architecture uses 3174 gates, 1145 slices, and 11 multipliers of one Xilinx Spartan-3E XC3S500E FPGA and reaches an operating frequency of 84.81 MHz. One input block of 8 x 8 elements of 8 bits each is processed in 2470 ns, and the pipeline latency is 123 clock cycles.

## B. SIMULATION RESULTS OF 1-D DCT



Fig.7. Simulation results of 1-D DCT

Here the input values  $a_0$  to  $a_7$  are 10101010, and  $c_{in0}$  to  $c_{in5}$  are 101111. The outputs are  $S_0 = 1$  and  $S_4 = 0$ .

1-D DCT ARCHITECTURE

The architecture of the 1-D DCT is implemented from the butterfly, as shown in the figure. This architecture consists of eight 18-bit registers, two 18-bit multiplexers (8x1), one 18-bit multiplexer (2x1), one floating-point adder (18-bit), one floating-point subtractor (18-bit), one floating-point multiplier (18-bit), six 18-bit latches, and a controller. The design considerations made in this architecture are as follows: parallelism, pipelining, and reusability are exploited so that considerable area minimization can be achieved. The sequence of operations carried out in this architecture is as follows: initially, the 8 inputs  $[x (0), x (1), \dots, x (7)]$ , i.e., the image pixel values, are serially loaded into the eight 18-bit registers. All register outputs are fed to multiplexer 1 and multiplexer 2. The register values are selected depending on the control signals given by the controller. The controller is designed as a Finite State Machine (FSM) that controls the overall operation of the architecture.



Fig.8. Simulation Result of Full adder

The DTT (Discrete Tchebichef Transform) represents a discrete class of the Chebyshev polynomials, and it is an alternative to the common DCT (Discrete Cosine Transform), which is present in several compression systems. High energy compaction and decorrelation are indicated as the main properties of this transform. The performance of the approximate DTT, combined with its lower computational effort, makes this transform an excellent choice for dedicated image compression hardware. As the resultant matrix of the state-of-the-art approximate DTT presents only 0, 1, -1, 2, and -2 values, it can be easily implemented in hardware using only adders and subtractors rather than general-purpose multipliers. In this work we use combinations of efficient 4-2, 6-2, and 8-2 adder compressors for the state-of-the-art approximate DTT implementations. We present an environment for the synthesis of the DTT in the Cadence Encounter RTL Compiler tool.



Fig.9. Simulation results of 4-2 Adder Compressor

#### **IV. CONCLUSION**

This work presented power-efficient state-of-the-art approximate DCTs built with efficient adder compressors. The approximate DCT implementations exploited combinations of 4-2 and 8-2 adder compressors. They were compared with the same approximate DCTs implemented with the (+) operator from the tool. A synthesis methodology enabled the power analysis with vectors taken from real images. Results show that both approximate DCTs using adder compressors reduce both area and power when compared with the same DCT architectures using the (+) operator. As the entire 2-D DCT is implemented using two stages of 1-D DCT, a total of twelve and ten 4-2 adder compressors are used, respectively. The approximate DCT solutions with adder compressors minimize both cell area and power consumption with good overall image quality.


#### Author's Profile:

**N.Venkata Pradeep** received the B.Tech (ECE) degree from SITS, Kadapa in 2015 and is pursuing the M.Tech (VLSISD) at SITS, Kadapa, AP, India.






**Mr. K. Bala** is currently working as an Associate Professor in ECE Department, Srinivasa Institute of Technology and Science, Ukkayapalli, Kadapa, India. He received his M.Tech from Sri Kottam Tulasi Reddy Memorial College of Engineering Kondair, Mahaboobnagar, A.P, India.



